Collaborative filtering

A distinction is often made between two forms of data collection for recommender systems. Explicit feedback relies on the user giving explicit signals about their preferences, e.g. review ratings, whereas implicit feedback refers to non-explicit signals of preference, e.g. user watch time. Traditionally, recommender systems can be split into three types:

  • Collaborative filtering (CF): CF produces recommendations based on the knowledge of users’ attitudes towards items, that is, it uses the “wisdom of the crowd” to recommend items.

  • Content-based (CB): CB recommender systems focus on the attributes of the items to recommend other items similar to what the user likes, based on their previous actions or explicit feedback.

  • Hybrid recommendation systems: Hybrid methods combine CB and CF approaches.

In many applications, content-based features are not easy to extract, so collaborative filtering approaches are preferred. For this reason, we will only explore collaborative filtering methods from now on.

CF methods typically fall into three types: memory-based, model-based and, more recently, deep-learning based (Su & Khoshgoftaar, 2009; He et al., 2017). Neighbourhood-based CF and item-based/user-based top-N recommendation are typical examples of memory-based systems, which use user rating data to compute the similarity between users or items. As mentioned previously, common model-based approaches include Bayesian networks, latent semantic models and Markov decision processes. In this investigation, we will use a weighted matrix factorization approach. Later on, we will generalize the matrix factorization algorithm via a non-linear neural architecture (a softmax model).
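To make the memory-based idea concrete, here is a minimal user-based sketch on toy data. The interaction matrix, the choice of cosine similarity and the scoring rule are illustrative only, not part of our pipeline:

```python
import numpy as np

# Toy binary interaction matrix (3 users x 4 items); rows are users.
R = np.array([
    [1.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
])

def cosine_sim(a, b):
    """Cosine similarity between two interaction vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Similarity of user 0 to every user (including itself).
sims = np.array([cosine_sim(R[0], R[u]) for u in range(len(R))])

# Score items for user 0 by similarity-weighted votes of the other users.
scores = sum(sims[u] * R[u] for u in [1, 2])
scores[R[0] > 0] = -np.inf            # mask items user 0 already knows
recommended = int(np.argmax(scores))  # index of the item to recommend
```

This is the "wisdom of the crowd" in its simplest form: user 0 is most similar to user 1, but every item user 1 likes is already known to user 0, so the recommendation comes from the less similar user 2.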

However, our approaches have a number of limitations, such as the inability to model the order of interactions. For instance, Markov chain algorithms (Rendle et al., 2010) can encode not only the same information as traditional CF methods but also the order in which users interacted with the items. Furthermore, the sparsity of the frequency matrix (described later on) makes computations prohibitively expensive in real-world settings without some optimization.

Setup

The next few code cells detail the initial preparatory steps needed for the development of our collaborative filtering models, namely: importing the required libraries; rescaling the IDs of users and artists; constructing an indicator variable for the presence of a user-artist interaction; and finding the most assigned tag of an artist.

from __future__ import print_function
import numpy as np
import pandas as pd
import collections
from IPython import display
from matplotlib import pyplot as plt
import sklearn
import sklearn.manifold
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
tf.logging.set_verbosity(tf.logging.ERROR)

# Add some convenience functions to Pandas DataFrame.
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.3f}'.format

# Install Altair and activate its colab renderer.
print("Installing Altair...")
!pip install git+git://github.com/altair-viz/altair.git
import altair as alt
alt.data_transformers.enable('default', max_rows=None)
alt.renderers.enable('colab')
print("Done installing Altair.")
# NEEDED FOR GOOGLE COLAB
# from google.colab import auth
#from google.colab import drive
# import gspread
# from oauth2client.client import GoogleCredentials

# drive.mount('/content/drive/')
# os.chdir("/content/drive/My Drive/DCU/fouth_year/advanced_machine_learning/music-recommodation-system")

Helper functions

def calculate_sparsity(M):
    """
    Computes the fill fraction (density) of the frequency matrix, i.e. the
    proportion of possible user-artist pairs with an observed interaction.
    The lower this value, the sparser the matrix.
    """
    # Number of possible interactions in the matrix
    matrix_size = M['userID'].nunique() * M['artistID'].nunique()
    num_plays = len(M['weight'])  # Number of observed interactions
    return num_plays / matrix_size
def build_music_sparse_tensor(music_df):
  """
  Args:
    music_df: a pd.DataFrame with `userID`, `artistID` and `weight` columns.
  Returns:
    a tf.SparseTensor representing the feedback matrix, with dense shape
    [num_users, num_artist] (taken from the enclosing scope).
  """
  indices = music_df[['userID', 'artistID']].values
  values = music_df['weight'].values
  return tf.SparseTensor(
      indices=indices,
      values=values,
      dense_shape=[num_users, num_artist])
def preproces_ids(music_df):
  """
  Args:
    music_df: a pd.DataFrame with `userID`, `artistID` and `weight` columns.
  Returns:
    a pd.DataFrame where userIDs and artistIDs now start at 0 and end at
      m-1 and n-1 (defined above), respectively
    two dictionaries preserving the original ids.
  """
  unique_user_ids_list = sorted(music_df['userID'].unique())
  # Print the smallest original userID, to confirm re-indexing is needed.
  print(unique_user_ids_list[0])

  unique_user_ids = dict(zip(range(len(unique_user_ids_list)), unique_user_ids_list))
  unique_user_ids_switched = dict(zip(unique_user_ids_list, range(len(unique_user_ids_list))))

  unique_artist_ids_list = sorted(music_df['artistID'].unique())
  unique_artist_ids = dict(zip(range(len(unique_artist_ids_list)), unique_artist_ids_list))
  unique_artist_ids_switched = dict(zip(unique_artist_ids_list, range(len(unique_artist_ids_list))))

  music_df['userID'] = music_df['userID'].map(unique_user_ids_switched)
  music_df['artistID'] = music_df['artistID'].map(unique_artist_ids_switched)

  return music_df, unique_user_ids, unique_artist_ids
def split_dataframe(df, holdout_fraction=0.1):
  """Splits a DataFrame into training and test sets.
  Args:
    df: a dataframe.
    holdout_fraction: fraction of dataframe rows to use in the test set.
  Returns:
    train: dataframe for training
    test: dataframe for testing
  """
  test = df.sample(frac=holdout_fraction, replace=False)
  train = df[~df.index.isin(test.index)]
  return train, test

Traditional recommender system development relies on explicit feedback, and many models treat the task as a regression problem. For instance, the input to the model would be a matrix \(F \in R^{m \times n}\) whose entry \(F_{ij}\) denotes user \(i\)'s preference for item \(j\) on a scale. In the classic movie-ratings example, this preference would be users giving a 1-to-5 star rating to different movies.

This dataset contains implicit feedback: observed logs of user interactions with items, in this instance users' listening counts for artists. However, implicit feedback does not signal negativity in the same way as a 1-star rating would. In our data, a user could listen to the songs of an artist only a limited number of times, but that does not necessarily mean the user has an aversion to that artist; the plays could, for example, come from a playlist curated by another user. Therefore, we construct a binary matrix, which has a value of one if an interaction is observed (i.e. a listening count has been logged between an artist and a user). Note that a 0 is not used to describe unobserved artist-user interactions; this is for optimization reasons, explained below.

user_artists = pd.read_csv('data/user_artists.dat', sep='\t')
user_artists['weight'] = 1
artists = pd.read_csv('data/artists.dat', sep='\t')
artists.rename({'id':'artistID'}, inplace=True, axis=1)
user_taggedartists = pd.read_csv(r'data/user_taggedartists-timestamps.dat', sep='\t')
user_taggedartists_years = pd.read_csv(r'data/user_taggedartists.dat', sep='\t')
tags = pd.read_csv(open('data/tags.dat', errors='replace'), sep='\t')
user_taggedartists = pd.merge(user_taggedartists, tags, on=['tagID'])
num_users = user_artists.userID.nunique()
num_artist = artists.artistID.nunique()
collab_filter_df = user_artists

Here, we calculate the top 10 tags by popularity. Then we assign one of them to an artist if the artist has a top-10 tag; if none of an artist's tags are in the top 10, we assign 'N/A'. Note, the next cell can take several minutes to compute.

top_10_tags = user_taggedartists['tagValue'].value_counts().index[0:10]
user_taggedartists['top10TagValue'] = None
for index, row in user_taggedartists.iterrows():
  if row['tagValue'] in top_10_tags:
    user_taggedartists.iloc[index, -1] = row['tagValue']
user_taggedartists.fillna('N/A',inplace=True)
artists = pd.merge(user_taggedartists, artists, on=['artistID'], how='right')[['artistID','name','top10TagValue','tagValue']].fillna('N/A')
artists = artists.groupby(['artistID','name','top10TagValue']).agg(lambda x:x.value_counts().index[0]).reset_index()  # keep each artist's most common tag
artists = artists.drop_duplicates(subset=['artistID'])
assert artists.artistID.nunique() == num_artist
artists.rename({'tagValue':'mostCommonGenre'},axis=1, inplace=True)
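As a side note, the row-wise loop above can be expressed in vectorized pandas, which is usually much faster than iterrows. A minimal sketch on toy data (the DataFrame here is a hypothetical stand-in for user_taggedartists, and we take only the single top tag for brevity):

```python
import pandas as pd

# Hypothetical stand-in for user_taggedartists, with the same column name.
df = pd.DataFrame({'tagValue': ['rock', 'pop', 'obscure', 'rock']})
top_tags = df['tagValue'].value_counts().index[:1]  # the most popular tag(s)

# Vectorized equivalent of the row-wise loop:
# keep the tag where it is popular, otherwise fall back to 'N/A'.
df['top10TagValue'] = df['tagValue'].where(df['tagValue'].isin(top_tags), 'N/A')
```

`Series.where` keeps values where the condition holds and substitutes the fallback elsewhere, so no Python-level loop over rows is needed.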

We require two embedding matrices to compute a similarity measure (one for queries, i.e. users, and one for items), but how do we obtain these two embeddings?

Matrix Factorisation

Figure 2: Data flow chart

First, we need to construct the feedback matrix \(F \in R^{m \times n}\), where \(m\) is the number of users and \(n\) is the number of artists. The goal is to generate two lower-dimensional matrices \(U \in R^{m \times p}\) and \(V \in R^{n \times p}\) (with \(p \ll m\) and \(p \ll n\)), representing latent user and artist components, so that: \[ F \approx UV^\top \]
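As a quick sanity check of the factorization idea, a NumPy sketch with toy shapes (all sizes here are illustrative): the product of two thin matrices yields a full-sized prediction matrix of rank at most \(p\).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 6, 8, 2             # users, artists, embedding dimension (p << m, n)

U = rng.normal(size=(m, p))   # user embeddings
V = rng.normal(size=(n, p))   # artist embeddings
F_hat = U @ V.T               # rank-p approximation of the feedback matrix
```

Every predicted affinity \( \hat{F}_{ij} \) is just the dot product of user \(i\)'s and artist \(j\)'s embeddings, which is what makes the factorized form cheap to query.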

First, we attempt to build the frequency matrix for both training and testing data. tf.SparseTensor is used for an efficient representation. Three separate arguments represent the tensor, namely indices, values and dense_shape, where a value \(A_{ij} = a\) is encoded by setting indices[k] = [i, j] and values[k] = a. The last argument, dense_shape, specifies the shape of the full underlying matrix. Note, as the indices argument holds row and column indices, some pre-processing needs to be performed on artist and user IDs: the IDs should start at 0 and end at \(m-1\) and \(n-1\) for users and artists, respectively. Presently, userIDs start at 2. Two dictionaries, orginal_user_ids and orginal_artist_ids, will preserve the original ids for analysis purposes later on. Assertions and print statements are used to ensure the validity of the transformations.

colab_filter_df, orginal_user_ids, orginal_artist_ids =  preproces_ids(collab_filter_df)
2
colab_filter_df.describe()
          userID   artistID     weight
count  92834.000  92834.000  92834.000
mean     944.222   3235.737      1.000
std      546.751   4197.217      0.000
min        0.000      0.000      1.000
25%      470.000    430.000      1.000
50%      944.000   1237.000      1.000
75%     1416.000   4266.000      1.000
max     1891.000  17631.000      1.000

Next, we calculate the number of unique users and artists, and the sparsity of our proposed frequency matrix, reported here as the fraction of user-artist pairs with an observed interaction (about 0.28%), before splitting into training and test subsets. Quite a sparse matrix indeed!

print(f'Number of unique users are: {collab_filter_df["userID"].nunique()}')
print(f'Number of unique artists are: {collab_filter_df["artistID"].nunique()}')
print(f'Sparsity of our frequency matrix: {calculate_sparsity(collab_filter_df)}')
Number of unique users are: 1892
Number of unique artists are: 17632
Sparsity of our frequency matrix: 0.002782815119924182
collab_filter_df.to_csv('data/test_user_artists.csv',index=False)
frequency_m_train, frequency_m_test = split_dataframe(colab_filter_df)
frequency_m_train_tensor  = build_music_sparse_tensor(frequency_m_train)
frequency_m_test_tensor  = build_music_sparse_tensor(frequency_m_test)
assert num_users  == frequency_m_train_tensor.shape.as_list()[0] 
assert num_artist == frequency_m_train_tensor.shape.as_list()[1] 
assert num_users == frequency_m_test_tensor.shape.as_list()[0] 
assert num_artist == frequency_m_test_tensor.shape.as_list()[1] 

Training a Matrix factorization model

Per the definition above, \(UV^\top\) approximates \(F\). The mean squared error (MSE) is used to measure this approximation error. In the notation below, \(k\) represents the set of observed (user, artist) pairs, and \(K\) is the number of observed pairs.

\[ \begin{align*} \text{MSE}(F, UV^\top) = \frac{1}{K}\sum_{(i, j) \in k}{( F_{ij} - (UV^\top)_{ij})^2} \end{align*} \]

However, rather than computing the full prediction matrix \(UV^\top\) and then gathering the entries corresponding to the observed listening counts, we only gather the embeddings of the observed pairs and compute their dot products. Thereby, we reduce the complexity from \(O(nm)\) to \(O(Kp)\), where \(p\) is the embedding dimension. Stochastic gradient descent (SGD) is used to minimize the loss (objective) function. The SGD algorithm cycles through the observed entries of the binary matrix and calculates the prediction error according to the following equation.
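The gather trick can be checked in NumPy with toy shapes (the observed pairs below are illustrative): gathering the \(K\) embedding rows and taking row-wise dot products gives exactly the same predictions as materialising the full \(n \times m\) product and indexing into it.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 100, 200, 3
U = rng.normal(size=(m, p))   # user embeddings
V = rng.normal(size=(n, p))   # artist embeddings

# K observed (user, artist) pairs and their binary feedback values.
rows = np.array([0, 5, 7, 99])
cols = np.array([3, 3, 150, 0])
vals = np.ones(4)

# O(Kp): gather only the observed embedding rows, row-wise dot products.
preds_sparse = np.sum(U[rows] * V[cols], axis=1)

# O(nm): materialise the full prediction matrix, then index it.
preds_dense = (U @ V.T)[rows, cols]

mse = np.mean((vals - preds_sparse) ** 2)
```

For our data (K ≈ 93k observed pairs versus nm ≈ 33M possible pairs), this is the difference between a cheap per-step update and an infeasible one.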

\[ e_{ij} = F_{ij} - U_{i}V_{j}^\top \]

Then it updates the user and artist embeddings as shown in the following equations.

\[ U_{i} \leftarrow U_{i} + \alpha (e_{ij}V_{j} - \beta U_{i}) \]
\[ V_{j} \leftarrow V_{j} + \alpha (e_{ij}U_{i} - \beta V_{j}) \]

where \(\alpha\) denotes the learning rate and \(\beta\) the regularization strength. The algorithm continues until convergence.
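The update rules above can be sketched in plain NumPy on a toy observed set. All sizes, coefficients and the number of epochs here are illustrative, not the values used in our model:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 20, 30, 4
alpha, beta = 0.05, 0.01        # learning rate, regularization strength

# A handful of observed entries of the binary feedback matrix F.
F = {(0, 1): 1.0, (0, 5): 1.0, (3, 2): 1.0, (7, 5): 1.0, (19, 29): 1.0}

U = 0.1 * rng.normal(size=(m, p))   # user embeddings, small random init
V = 0.1 * rng.normal(size=(n, p))   # artist embeddings

for epoch in range(500):
    for (i, j), f_ij in F.items():
        e_ij = f_ij - U[i] @ V[j]                    # prediction error
        U[i] += alpha * (e_ij * V[j] - beta * U[i])  # user update
        V[j] += alpha * (e_ij * U[i] - beta * V[j])  # artist update

# After training, the observed entries should be well approximated.
errs = [abs(f - U[i] @ V[j]) for (i, j), f in F.items()]
```

Note the regularization term \(\beta\) means the fitted values settle slightly below the targets rather than interpolating them exactly.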

Other matrix factorization algorithms are also commonly used, such as Alternating Least Squares (Takács and Tikk, 2012). A weighted variant of the aforementioned algorithm, known as Weighted Alternating Least Squares (WALS), is slower than SGD but can be parallelised. For the purposes of this investigation, we are not particularly concerned with training times/latency requirements, so we proceed with SGD.

We also decide to add regularization to our model, to avoid overfitting. Overfitting occurs when the model fits the training dataset too closely and does not generalize well to unseen or future data. In the context of artist recommendation, fitting the observed listening counts often emphasizes learning high similarity (between artists with many listeners), but a good embedding representation also requires learning low similarity (between artists with few listeners).
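One regularizer we use below, the gravity loss, penalises the average squared prediction over all user-artist pairs. Naively this costs \(O(nm)\), but it can be computed from the small \(p \times p\) Gram matrices via the identity \(\sum_{ij}(U_i \cdot V_j)^2 = \langle U^\top U,\, V^\top V \rangle\). A NumPy check of that identity with toy shapes:

```python
import numpy as np

rng = np.random.default_rng(3)
U = rng.normal(size=(5, 3))   # toy user embeddings
V = rng.normal(size=(7, 3))   # toy artist embeddings

# Naive: average squared prediction over every (user, artist) pair, O(nm).
naive = np.mean((U @ V.T) ** 2)

# Gram-matrix form: <U^T U, V^T V> / (nm), costing only O((n + m) p^2).
fast = np.sum((U.T @ U) * (V.T @ V)) / (U.shape[0] * V.shape[0])
```

This is why the gravity function defined later never materialises \(UV^\top\).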

First, we define two classes, train_matrix_norm and build_matrix_norm. The build_matrix_norm class performs the necessary pre-processing steps before we train the model, such as specifying the loss metric to optimise, the loss components (e.g. the gravity loss for the regularized model) and the initial artist and user embeddings. train_matrix_norm simply trains the models and outputs figures detailing the loss metrics and components. The methods build_vanilla() and build_reg_model() perform these steps for the non-regularized and regularized models, respectively.

### Training a Matrix Factorization model
class train_matrix_norm(object):
  """Simple class that represents a matrix normalisation model"""
  def __init__(self, embedding_vars, loss, metrics=None):
    """Initializes a Matrix normalisation model 
    Args:
      embedding_vars: A dictionary of tf.Variables.
      loss: A float Tensor. The loss to optimize.
      metrics: optional list of dictionaries of Tensors. The metrics in each
        dictionary will be plotted in a separate figure during training.
    """
    self._embedding_vars = embedding_vars
    self._loss = loss
    self._metrics = metrics
    self._embeddings = {k: None for k in embedding_vars}
    self._session = None


  @property
  def embeddings(self):
    """The embeddings dictionary."""
    return self._embeddings


  def train(self, num_iterations=100, learning_rate=1.0, plot_results=True,
            optimizer=tf.train.GradientDescentOptimizer):
    """Trains the model.
    Args:
      iterations: number of iterations to run.
      learning_rate: optimizer learning rate.
      plot_results: whether to plot the results at the end of training.
      optimizer: the optimizer to use. Default to SDG
    Returns:
      The metrics dictionary evaluated at the last iteration.
    """
    with self._loss.graph.as_default():
      opt = optimizer(learning_rate)
      train_op = opt.minimize(self._loss)
      local_init_op = tf.group(
          tf.variables_initializer(opt.variables()),
          tf.local_variables_initializer())
      if self._session is None:
        self._session = tf.Session()
        with self._session.as_default():
          self._session.run(tf.global_variables_initializer())
          self._session.run(tf.tables_initializer())
          tf.train.start_queue_runners()

    with self._session.as_default():
      local_init_op.run()
      iterations = []
      metrics = self._metrics or ({},)
      metrics_vals = [collections.defaultdict(list) for _ in self._metrics]

      # Train and append results.
      for i in range(num_iterations + 1):
        _, results = self._session.run((train_op, metrics))
        if (i % 10 == 0) or i == num_iterations:
          print("\r iteration %d: " % i + ", ".join(
                ["%s=%f" % (k, v) for r in results for k, v in r.items()]),
                end='')
          iterations.append(i)
          for metric_val, result in zip(metrics_vals, results):
            for k, v in result.items():
              metric_val[k].append(v)

      for k, v in self._embedding_vars.items():
        self._embeddings[k] = v.eval()

      if plot_results:
        # Plot the metrics.
        num_subplots = len(metrics)+1
        fig = plt.figure()
        fig.set_size_inches(num_subplots*10, 8)
        for i, metric_vals in enumerate(metrics_vals):
          ax = fig.add_subplot(1, num_subplots, i+1)
          for k, v in metric_vals.items():
            ax.plot(iterations, v, label=k)
          ax.set_xlim([1, num_iterations])
          ax.legend()
      return results

class build_matrix_norm():
  """Namespace class for model-building helpers. Its functions are called
  directly on the class (e.g. build_matrix_norm.build_vanilla(...)) and each
  returns a train_matrix_norm object ready for training."""

  def sparse_mean_square_error(sparse_listens, user_embeddings, artist_embeddings):
    """
    Args:
      sparse_listens: A SparseTensor listening matrix, of dense_shape [N, M]
      user_embeddings: A dense Tensor U of shape [N, k] where k is the embedding
        dimension, such that U_i is the embedding of user i.
      artist_embeddings: A dense Tensor V of shape [M, k] where k is the embedding
        dimension, such that V_j is the embedding of artist j.
    Returns:
      A scalar Tensor representing the MSE between the observed listens and the
        model's predictions.
    """
    predictions = tf.gather_nd(
        tf.matmul(user_embeddings, artist_embeddings, transpose_b=True),
        sparse_listens.indices)
    loss = tf.losses.mean_squared_error(sparse_listens.values, predictions)
    return loss
    
  def gravity(U, V):
    """Creates a gravity loss given two embedding matrices."""
    return 1. / (U.shape[0].value*V.shape[0].value) * tf.reduce_sum(
        tf.matmul(U, U, transpose_a=True) * tf.matmul(V, V, transpose_a=True))
  
  def build_vanilla(embedding_dim=3, init_stddev=1.):
    """performs the necessary preprocessing steps for the regularized model.  """ 
    # Initialize the embeddings using a normal distribution.
    U = tf.Variable(tf.random.normal(
      [frequency_m_train_tensor.dense_shape[0], embedding_dim], stddev=init_stddev))
    V = tf.Variable(tf.random.normal(
      [frequency_m_train_tensor.dense_shape[1], embedding_dim], stddev=init_stddev))
    
    embeddings = {"userID": U, "artistID": V}
    error_train = build_matrix_norm.sparse_mean_square_error(frequency_m_train_tensor, U, V)
    error_test = build_matrix_norm.sparse_mean_square_error(frequency_m_test_tensor, U, V)
    metrics = {
        'train_error': error_train,
        'test_error': error_test
    }
    return train_matrix_norm(embeddings, error_train, [metrics])


  def build_reg_model(embedding_dim=3, regularization_coeff=.1, gravity_coeff=1.,
  init_stddev=0.1
  ):
    """performs the necessary preprocessing steps for the regularized model.  """ 
    U = tf.Variable(tf.random.normal(
      [frequency_m_train_tensor.dense_shape[0], embedding_dim], stddev=init_stddev))
    V = tf.Variable(tf.random.normal(
      [frequency_m_train_tensor.dense_shape[1], embedding_dim], stddev=init_stddev))
  
    embeddings = {"userID": U, "artistID": V}

    error_train = build_matrix_norm.sparse_mean_square_error(frequency_m_train_tensor, U, V)
    error_test = build_matrix_norm.sparse_mean_square_error(frequency_m_test_tensor, U, V)
    gravity_loss = gravity_coeff * build_matrix_norm.gravity(U, V)
    regularization_loss = regularization_coeff * (
      tf.reduce_sum(U*U)/U.shape[0].value + tf.reduce_sum(V*V)/V.shape[0].value)
    total_loss = error_train + regularization_loss + gravity_loss
    losses = {
      'train_error_observed': error_train,
      'test_error_observed': error_test,
    }
    loss_components = {
      'observed_loss': error_train,
      'regularization_loss': regularization_loss,
      'gravity_loss': gravity_loss,
    }

    return train_matrix_norm(embeddings, total_loss, [losses, loss_components])

Vanilla Model (non-regularized)

vanilla_model = build_matrix_norm.build_vanilla(embedding_dim=35,init_stddev=.05)
vanilla_model.train(num_iterations=2000, learning_rate=20.)
 iteration 0: train_error=1.000092, test_error=1.000133
 iteration 100: train_error=0.939833, test_error=0.961110
 iteration 200: train_error=0.547145, test_error=0.585066
 iteration 300: train_error=0.328149, test_error=0.380208
 iteration 400: train_error=0.218006, test_error=0.287726
 iteration 500: train_error=0.150089, test_error=0.237911
 iteration 600: train_error=0.104079, test_error=0.208409
 iteration 700: train_error=0.071591, test_error=0.189550
 iteration 800: train_error=0.048763, test_error=0.176943
 iteration 900: train_error=0.033166, test_error=0.168313
 iteration 1000: train_error=0.022762, test_error=0.162306
 iteration 1100: train_error=0.015860, test_error=0.158056
 iteration 1160: train_error=0.012874, test_error=0.156106
 iteration 1170: train_error=0.012442, test_error=0.155816
 iteration 1180: train_error=0.012026, test_error=0.155535
 iteration 1190: train_error=0.011626, test_error=0.155263
 iteration 1200: train_error=0.011241, test_error=0.155000
 iteration 1210: train_error=0.010871, test_error=0.154744
 iteration 1220: train_error=0.010515, test_error=0.154496
 iteration 1230: train_error=0.010172, test_error=0.154256
 iteration 1240: train_error=0.009843, test_error=0.154023
 iteration 1250: train_error=0.009525, test_error=0.153798
 iteration 1260: train_error=0.009220, test_error=0.153579
 iteration 1270: train_error=0.008925, test_error=0.153366
 iteration 1280: train_error=0.008642, test_error=0.153160
 iteration 1290: train_error=0.008369, test_error=0.152960
 iteration 1300: train_error=0.008106, test_error=0.152766
 iteration 1310: train_error=0.007853, test_error=0.152577
 iteration 1320: train_error=0.007609, test_error=0.152394
 iteration 1330: train_error=0.007374, test_error=0.152217
 iteration 1340: train_error=0.007148, test_error=0.152044
 iteration 1350: train_error=0.006929, test_error=0.151876
 iteration 1360: train_error=0.006719, test_error=0.151713
 iteration 1370: train_error=0.006516, test_error=0.151555
 iteration 1380: train_error=0.006320, test_error=0.151401
 iteration 1390: train_error=0.006132, test_error=0.151252
 iteration 1400: train_error=0.005950, test_error=0.151106
 iteration 1410: train_error=0.005774, test_error=0.150965
 iteration 1420: train_error=0.005605, test_error=0.150828
 iteration 1430: train_error=0.005442, test_error=0.150694
 iteration 1440: train_error=0.005284, test_error=0.150564
 iteration 1450: train_error=0.005132, test_error=0.150437
 iteration 1460: train_error=0.004986, test_error=0.150314
 iteration 1470: train_error=0.004844, test_error=0.150194
 iteration 1480: train_error=0.004708, test_error=0.150077
 iteration 1490: train_error=0.004576, test_error=0.149964
 iteration 1500: train_error=0.004448, test_error=0.149853
 iteration 1510: train_error=0.004325, test_error=0.149745
 iteration 1520: train_error=0.004207, test_error=0.149640
 iteration 1530: train_error=0.004092, test_error=0.149538
 iteration 1540: train_error=0.003981, test_error=0.149439
 iteration 1550: train_error=0.003874, test_error=0.149341
 iteration 1560: train_error=0.003771, test_error=0.149247
 iteration 1570: train_error=0.003671, test_error=0.149154
 iteration 1580: train_error=0.003574, test_error=0.149064
 iteration 1590: train_error=0.003481, test_error=0.148977
 iteration 1600: train_error=0.003391, test_error=0.148891
 iteration 1610: train_error=0.003303, test_error=0.148808
 iteration 1620: train_error=0.003219, test_error=0.148726
 iteration 1630: train_error=0.003137, test_error=0.148647
 iteration 1640: train_error=0.003059, test_error=0.148569
 iteration 1650: train_error=0.002982, test_error=0.148493
 iteration 1660: train_error=0.002908, test_error=0.148419
 iteration 1670: train_error=0.002837, test_error=0.148347
 iteration 1680: train_error=0.002768, test_error=0.148276
 iteration 1690: train_error=0.002701, test_error=0.148208
 iteration 1700: train_error=0.002636, test_error=0.148140
 iteration 1710: train_error=0.002574, test_error=0.148074
 iteration 1720: train_error=0.002513, test_error=0.148010
 iteration 1730: train_error=0.002455, test_error=0.147947
 iteration 1740: train_error=0.002398, test_error=0.147886
 iteration 1750: train_error=0.002343, test_error=0.147826
 iteration 1760: train_error=0.002289, test_error=0.147767
 iteration 1770: train_error=0.002238, test_error=0.147710
 iteration 1780: train_error=0.002188, test_error=0.147653
 iteration 1790: train_error=0.002139, test_error=0.147598
 iteration 1800: train_error=0.002092, test_error=0.147545
 iteration 1810: train_error=0.002047, test_error=0.147492
 iteration 1820: train_error=0.002003, test_error=0.147440
 iteration 1830: train_error=0.001960, test_error=0.147390
 iteration 1840: train_error=0.001918, test_error=0.147341
 iteration 1850: train_error=0.001878, test_error=0.147292
 iteration 1860: train_error=0.001839, test_error=0.147245
 iteration 1870: train_error=0.001801, test_error=0.147198
 iteration 1880: train_error=0.001764, test_error=0.147153
 iteration 1890: train_error=0.001729, test_error=0.147108
 iteration 1900: train_error=0.001694, test_error=0.147065
 iteration 1910: train_error=0.001661, test_error=0.147022
 iteration 1920: train_error=0.001628, test_error=0.146980
 iteration 1930: train_error=0.001596, test_error=0.146939
 iteration 1940: train_error=0.001566, test_error=0.146898
 iteration 1950: train_error=0.001536, test_error=0.146859
 iteration 1960: train_error=0.001507, test_error=0.146820
 iteration 1970: train_error=0.001479, test_error=0.146782
 iteration 1980: train_error=0.001451, test_error=0.146744
 iteration 1990: train_error=0.001425, test_error=0.146708
 iteration 2000: train_error=0.001399, test_error=0.146672
[{'train_error': 0.001399054, 'test_error': 0.14667167}]
[Figure _images/Collaborative_filtering_27_203.png: train and test error vs. iteration]
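The iteration log above is the trace of gradient descent on the mean-squared error over the observed entries of the rating matrix. As a self-contained illustration, here is a toy full-batch version in plain NumPy; `train_mf` and its hyperparameters are a sketch for exposition, not the notebook's `build_matrix_norm` implementation:

```python
import numpy as np

def train_mf(ratings, num_users, num_items, k=35, lr=0.01, iters=100, seed=0):
    """Toy gradient-descent matrix factorization on observed entries only.

    ratings: list of (user, item, value) triples.
    Returns user/item embeddings and the MSE recorded every 10 iterations.
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.05, size=(num_users, k))  # user embeddings
    V = rng.normal(scale=0.05, size=(num_items, k))  # item embeddings
    history = []
    for it in range(iters):
        # Full-batch gradient of the observed MSE.
        gU = np.zeros_like(U)
        gV = np.zeros_like(V)
        err_sum = 0.0
        for u, i, r in ratings:
            e = U[u] @ V[i] - r           # prediction error on one entry
            err_sum += e * e
            gU[u] += 2 * e * V[i] / len(ratings)
            gV[i] += 2 * e * U[u] / len(ratings)
        U -= lr * gU
        V -= lr * gV
        if it % 10 == 0:
            history.append(err_sum / len(ratings))
    return U, V, history
```

On real data the observed entries live in a sparse matrix and the updates are vectorized, but the gradient has the same form.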

Regularized model

The unregularized model overfits: its train error falls to 0.0014 while the test error plateaus around 0.147. Adding regularization and a gravity penalty aims to close this gap.

reg_model = build_matrix_norm.build_reg_model(regularization_coeff=0.1, gravity_coeff=1.0, embedding_dim=35, init_stddev=0.05)
reg_model.train(num_iterations=2000, learning_rate=20.)
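The three quantities reported in the log below — observed_loss, regularization_loss, and gravity_loss — sum to the training objective. A minimal NumPy sketch of how such a loss could be assembled follows; `total_loss` and its exact normalizations are assumptions for illustration, not the notebook's code:

```python
import numpy as np

def total_loss(U, V, ratings, reg_coeff=0.1, gravity_coeff=1.0):
    """Illustrative regularized objective for matrix factorization.

    U: (num_users, k) user embeddings; V: (num_items, k) item embeddings;
    ratings: list of observed (user, item, value) triples.
    """
    # Observed MSE over only the rated (user, item) pairs.
    preds = np.array([U[u] @ V[i] for u, i, _ in ratings])
    targets = np.array([r for _, _, r in ratings])
    observed_loss = np.mean((preds - targets) ** 2)

    # L2 penalty: average squared embedding norm for users and items.
    regularization_loss = reg_coeff * (
        np.sum(U ** 2) / U.shape[0] + np.sum(V ** 2) / V.shape[0]
    )

    # "Gravity" term: mean squared prediction over *all* user-item pairs,
    # pulling unobserved predictions toward zero.  Uses the identity
    # ||U V^T||_F^2 = sum((U^T U) * (V^T V)) to avoid the full n x m matrix.
    n, m = U.shape[0], V.shape[0]
    gravity_loss = gravity_coeff * np.sum((U.T @ U) * (V.T @ V)) / (n * m)

    return observed_loss + regularization_loss + gravity_loss
```

The Frobenius identity lets the gravity term be evaluated in O(k²) memory rather than materializing the full num_users × num_items prediction matrix, which matters given the sparsity concerns noted earlier.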
 iteration 0: train_error_observed=1.000257, test_error_observed=1.000239, observed_loss=1.000257, regularization_loss=0.017490, gravity_loss=0.000218
 iteration 10: train_error_observed=0.998593, test_error_observed=1.000192, observed_loss=0.998593, regularization_loss=0.017086, gravity_loss=0.000208
 iteration 20: train_error_observed=0.996935, test_error_observed=1.000085, observed_loss=0.996935, regularization_loss=0.016736, gravity_loss=0.000200
 iteration 30: train_error_observed=0.995165, test_error_observed=0.999823, observed_loss=0.995165, regularization_loss=0.016439, gravity_loss=0.000192
 iteration 40: train_error_observed=0.993103, test_error_observed=0.999250, observed_loss=0.993103, regularization_loss=0.016196, gravity_loss=0.000187
 iteration 50: train_error_observed=0.990439, test_error_observed=0.998086, observed_loss=0.990439, regularization_loss=0.016016, gravity_loss=0.000182
 iteration 60: train_error_observed=0.986627, test_error_observed=0.995825, observed_loss=0.986627, regularization_loss=0.015914, gravity_loss=0.000180
 iteration 70: train_error_observed=0.980719, test_error_observed=0.991574, observed_loss=0.980719, regularization_loss=0.015922, gravity_loss=0.000181
 iteration 80: train_error_observed=0.971191, test_error_observed=0.983868, observed_loss=0.971191, regularization_loss=0.016096, gravity_loss=0.000187
 iteration 90: train_error_observed=0.955955, test_error_observed=0.970661, observed_loss=0.955955, regularization_loss=0.016521, gravity_loss=0.000204
 iteration 100: train_error_observed=0.933076, test_error_observed=0.949953, observed_loss=0.933076, regularization_loss=0.017308, gravity_loss=0.000242
 iteration 110: train_error_observed=0.902439, test_error_observed=0.921380, observed_loss=0.902439, regularization_loss=0.018544, gravity_loss=0.000323
 iteration 120: train_error_observed=0.866792, test_error_observed=0.887395, observed_loss=0.866792, regularization_loss=0.020221, gravity_loss=0.000470
 iteration 130: train_error_observed=0.829598, test_error_observed=0.851456, observed_loss=0.829598, regularization_loss=0.022226, gravity_loss=0.000702
 iteration 140: train_error_observed=0.792147, test_error_observed=0.815091, observed_loss=0.792147, regularization_loss=0.024442, gravity_loss=0.001028
 iteration 150: train_error_observed=0.754171, test_error_observed=0.778140, observed_loss=0.754171, regularization_loss=0.026822, gravity_loss=0.001457
 iteration 160: train_error_observed=0.715819, test_error_observed=0.740664, observed_loss=0.715819, regularization_loss=0.029361, gravity_loss=0.002007
 iteration 170: train_error_observed=0.678035, test_error_observed=0.703526, observed_loss=0.678035, regularization_loss=0.032036, gravity_loss=0.002696
 iteration 180: train_error_observed=0.641905, test_error_observed=0.667851, observed_loss=0.641905, regularization_loss=0.034793, gravity_loss=0.003528
 iteration 190: train_error_observed=0.608154, test_error_observed=0.634479, observed_loss=0.608154, regularization_loss=0.037566, gravity_loss=0.004496
 iteration 200: train_error_observed=0.577080, test_error_observed=0.603814, observed_loss=0.577080, regularization_loss=0.040296, gravity_loss=0.005584
 iteration 210: train_error_observed=0.548696, test_error_observed=0.575925, observed_loss=0.548696, regularization_loss=0.042943, gravity_loss=0.006775
 iteration 220: train_error_observed=0.522863, test_error_observed=0.550692, observed_loss=0.522863, regularization_loss=0.045483, gravity_loss=0.008049
 iteration 230: train_error_observed=0.499380, test_error_observed=0.527907, observed_loss=0.499380, regularization_loss=0.047901, gravity_loss=0.009389
 iteration 240: train_error_observed=0.478023, test_error_observed=0.507334, observed_loss=0.478023, regularization_loss=0.050193, gravity_loss=0.010779
 iteration 250: train_error_observed=0.458575, test_error_observed=0.488740, observed_loss=0.458575, regularization_loss=0.052356, gravity_loss=0.012205
 iteration 260: train_error_observed=0.440831, test_error_observed=0.471906, observed_loss=0.440831, regularization_loss=0.054393, gravity_loss=0.013655
 iteration 270: train_error_observed=0.424607, test_error_observed=0.456633, observed_loss=0.424607, regularization_loss=0.056308, gravity_loss=0.015118
 iteration 280: train_error_observed=0.409737, test_error_observed=0.442743, observed_loss=0.409737, regularization_loss=0.058105, gravity_loss=0.016584
 iteration 290: train_error_observed=0.396073, test_error_observed=0.430079, observed_loss=0.396073, regularization_loss=0.059791, gravity_loss=0.018047
 iteration 300: train_error_observed=0.383485, test_error_observed=0.418504, observed_loss=0.383485, regularization_loss=0.061372, gravity_loss=0.019498
 iteration 310: train_error_observed=0.371857, test_error_observed=0.407896, observed_loss=0.371857, regularization_loss=0.062853, gravity_loss=0.020934
 iteration 320: train_error_observed=0.361089, test_error_observed=0.398151, observed_loss=0.361089, regularization_loss=0.064241, gravity_loss=0.022348
 iteration 330: train_error_observed=0.351092, test_error_observed=0.389177, observed_loss=0.351092, regularization_loss=0.065543, gravity_loss=0.023738
 iteration 340: train_error_observed=0.341788, test_error_observed=0.380893, observed_loss=0.341788, regularization_loss=0.066764, gravity_loss=0.025101
 iteration 350: train_error_observed=0.333110, test_error_observed=0.373232, observed_loss=0.333110, regularization_loss=0.067909, gravity_loss=0.026433
 iteration 360: train_error_observed=0.324998, test_error_observed=0.366132, observed_loss=0.324998, regularization_loss=0.068984, gravity_loss=0.027734
 iteration 370: train_error_observed=0.317401, test_error_observed=0.359540, observed_loss=0.317401, regularization_loss=0.069994, gravity_loss=0.029001
 iteration 380: train_error_observed=0.310273, test_error_observed=0.353409, observed_loss=0.310273, regularization_loss=0.070944, gravity_loss=0.030234
 iteration 390: train_error_observed=0.303572, test_error_observed=0.347700, observed_loss=0.303572, regularization_loss=0.071837, gravity_loss=0.031431
 iteration 400: train_error_observed=0.297264, test_error_observed=0.342374, observed_loss=0.297264, regularization_loss=0.072678, gravity_loss=0.032593
 iteration 410: train_error_observed=0.291315, test_error_observed=0.337400, observed_loss=0.291315, regularization_loss=0.073470, gravity_loss=0.033718
 iteration 420: train_error_observed=0.285698, test_error_observed=0.332749, observed_loss=0.285698, regularization_loss=0.074218, gravity_loss=0.034806
 iteration 430: train_error_observed=0.280386, test_error_observed=0.328395, observed_loss=0.280386, regularization_loss=0.074923, gravity_loss=0.035858
 iteration 440: train_error_observed=0.275356, test_error_observed=0.324315, observed_loss=0.275356, regularization_loss=0.075590, gravity_loss=0.036874
 iteration 450: train_error_observed=0.270587, test_error_observed=0.320487, observed_loss=0.270587, regularization_loss=0.076220, gravity_loss=0.037853
 iteration 460: train_error_observed=0.266059, test_error_observed=0.316892, observed_loss=0.266059, regularization_loss=0.076817, gravity_loss=0.038796
 iteration 470: train_error_observed=0.261756, test_error_observed=0.313513, observed_loss=0.261756, regularization_loss=0.077383, gravity_loss=0.039704
 iteration 480: train_error_observed=0.257661, test_error_observed=0.310333, observed_loss=0.257661, regularization_loss=0.077919, gravity_loss=0.040576
 iteration 490: train_error_observed=0.253759, test_error_observed=0.307339, observed_loss=0.253759, regularization_loss=0.078429, gravity_loss=0.041414
 iteration 500: train_error_observed=0.250036, test_error_observed=0.304516, observed_loss=0.250036, regularization_loss=0.078914, gravity_loss=0.042219
 iteration 510: train_error_observed=0.246481, test_error_observed=0.301852, observed_loss=0.246481, regularization_loss=0.079377, gravity_loss=0.042990
 iteration 520: train_error_observed=0.243082, test_error_observed=0.299337, observed_loss=0.243082, regularization_loss=0.079818, gravity_loss=0.043728
 iteration 530: train_error_observed=0.239828, test_error_observed=0.296958, observed_loss=0.239828, regularization_loss=0.080240, gravity_loss=0.044436
 iteration 540: train_error_observed=0.236709, test_error_observed=0.294708, observed_loss=0.236709, regularization_loss=0.080643, gravity_loss=0.045112
 iteration 550: train_error_observed=0.233716, test_error_observed=0.292577, observed_loss=0.233716, regularization_loss=0.081031, gravity_loss=0.045758
 iteration 560: train_error_observed=0.230841, test_error_observed=0.290557, observed_loss=0.230841, regularization_loss=0.081403, gravity_loss=0.046375
 iteration 570: train_error_observed=0.228076, test_error_observed=0.288641, observed_loss=0.228076, regularization_loss=0.081761, gravity_loss=0.046964
 iteration 580: train_error_observed=0.225414, test_error_observed=0.286821, observed_loss=0.225414, regularization_loss=0.082107, gravity_loss=0.047525
 iteration 590: train_error_observed=0.222848, test_error_observed=0.285091, observed_loss=0.222848, regularization_loss=0.082442, gravity_loss=0.048059
 iteration 600: train_error_observed=0.220371, test_error_observed=0.283446, observed_loss=0.220371, regularization_loss=0.082766, gravity_loss=0.048568
 iteration 610: train_error_observed=0.217979, test_error_observed=0.281879, observed_loss=0.217979, regularization_loss=0.083081, gravity_loss=0.049053
 iteration 620: train_error_observed=0.215666, test_error_observed=0.280386, observed_loss=0.215666, regularization_loss=0.083387, gravity_loss=0.049513
 iteration 630: train_error_observed=0.213426, test_error_observed=0.278962, observed_loss=0.213426, regularization_loss=0.083686, gravity_loss=0.049950
 iteration 640: train_error_observed=0.211256, test_error_observed=0.277602, observed_loss=0.211256, regularization_loss=0.083978, gravity_loss=0.050364
 iteration 650: train_error_observed=0.209150, test_error_observed=0.276304, observed_loss=0.209150, regularization_loss=0.084264, gravity_loss=0.050757
 iteration 660: train_error_observed=0.207105, test_error_observed=0.275062, observed_loss=0.207105, regularization_loss=0.084545, gravity_loss=0.051130
 iteration 670: train_error_observed=0.205118, test_error_observed=0.273874, observed_loss=0.205118, regularization_loss=0.084821, gravity_loss=0.051482
 iteration 680: train_error_observed=0.203184, test_error_observed=0.272736, observed_loss=0.203184, regularization_loss=0.085093, gravity_loss=0.051815
 iteration 690: train_error_observed=0.201301, test_error_observed=0.271645, observed_loss=0.201301, regularization_loss=0.085362, gravity_loss=0.052129
 iteration 700: train_error_observed=0.199465, test_error_observed=0.270599, observed_loss=0.199465, regularization_loss=0.085628, gravity_loss=0.052426
 iteration 710: train_error_observed=0.197675, test_error_observed=0.269596, observed_loss=0.197675, regularization_loss=0.085891, gravity_loss=0.052706
 iteration 720: train_error_observed=0.195927, test_error_observed=0.268631, observed_loss=0.195927, regularization_loss=0.086152, gravity_loss=0.052969
 iteration 730: train_error_observed=0.194218, test_error_observed=0.267705, observed_loss=0.194218, regularization_loss=0.086411, gravity_loss=0.053216
 iteration 740: train_error_observed=0.192548, test_error_observed=0.266813, observed_loss=0.192548, regularization_loss=0.086669, gravity_loss=0.053448
 iteration 750: train_error_observed=0.190913, test_error_observed=0.265955, observed_loss=0.190913, regularization_loss=0.086927, gravity_loss=0.053665
 iteration 760: train_error_observed=0.189312, test_error_observed=0.265129, observed_loss=0.189312, regularization_loss=0.087183, gravity_loss=0.053869
 iteration 770: train_error_observed=0.187743, test_error_observed=0.264333, observed_loss=0.187743, regularization_loss=0.087439, gravity_loss=0.054059
 iteration 780: train_error_observed=0.186204, test_error_observed=0.263565, observed_loss=0.186204, regularization_loss=0.087695, gravity_loss=0.054236
 iteration 790: train_error_observed=0.184694, test_error_observed=0.262824, observed_loss=0.184694, regularization_loss=0.087950, gravity_loss=0.054400
 iteration 800: train_error_observed=0.183211, test_error_observed=0.262108, observed_loss=0.183211, regularization_loss=0.088206, gravity_loss=0.054553
 iteration 810: train_error_observed=0.181755, test_error_observed=0.261417, observed_loss=0.181755, regularization_loss=0.088462, gravity_loss=0.054694
 iteration 820: train_error_observed=0.180323, test_error_observed=0.260749, observed_loss=0.180323, regularization_loss=0.088719, gravity_loss=0.054824
 iteration 830: train_error_observed=0.178915, test_error_observed=0.260104, observed_loss=0.178915, regularization_loss=0.088977, gravity_loss=0.054944
 iteration 840: train_error_observed=0.177530, test_error_observed=0.259479, observed_loss=0.177530, regularization_loss=0.089235, gravity_loss=0.055054
 iteration 850: train_error_observed=0.176166, test_error_observed=0.258874, observed_loss=0.176166, regularization_loss=0.089494, gravity_loss=0.055154
 iteration 860: train_error_observed=0.174823, test_error_observed=0.258288, observed_loss=0.174823, regularization_loss=0.089754, gravity_loss=0.055245
 iteration 870: train_error_observed=0.173500, test_error_observed=0.257721, observed_loss=0.173500, regularization_loss=0.090015, gravity_loss=0.055326
 iteration 880: train_error_observed=0.172196, test_error_observed=0.257170, observed_loss=0.172196, regularization_loss=0.090277, gravity_loss=0.055400
 iteration 890: train_error_observed=0.170911, test_error_observed=0.256637, observed_loss=0.170911, regularization_loss=0.090540, gravity_loss=0.055465
 iteration 900: train_error_observed=0.169643, test_error_observed=0.256119, observed_loss=0.169643, regularization_loss=0.090804, gravity_loss=0.055522
 iteration 910: train_error_observed=0.168393, test_error_observed=0.255617, observed_loss=0.168393, regularization_loss=0.091070, gravity_loss=0.055572
 iteration 920: train_error_observed=0.167159, test_error_observed=0.255129, observed_loss=0.167159, regularization_loss=0.091336, gravity_loss=0.055615
 iteration 930: train_error_observed=0.165941, test_error_observed=0.254656, observed_loss=0.165941, regularization_loss=0.091604, gravity_loss=0.055651
 iteration 940: train_error_observed=0.164739, test_error_observed=0.254195, observed_loss=0.164739, regularization_loss=0.091872, gravity_loss=0.055680
 iteration 950: train_error_observed=0.163552, test_error_observed=0.253748, observed_loss=0.163552, regularization_loss=0.092142, gravity_loss=0.055703
 iteration 960: train_error_observed=0.162380, test_error_observed=0.253313, observed_loss=0.162380, regularization_loss=0.092413, gravity_loss=0.055720
 iteration 970: train_error_observed=0.161223, test_error_observed=0.252890, observed_loss=0.161223, regularization_loss=0.092684, gravity_loss=0.055731
 iteration 980: train_error_observed=0.160079, test_error_observed=0.252479, observed_loss=0.160079, regularization_loss=0.092957, gravity_loss=0.055737
 iteration 990: train_error_observed=0.158949, test_error_observed=0.252079, observed_loss=0.158949, regularization_loss=0.093230, gravity_loss=0.055738
 iteration 1000: train_error_observed=0.157832, test_error_observed=0.251689, observed_loss=0.157832, regularization_loss=0.093504, gravity_loss=0.055734
 iteration 1010: train_error_observed=0.156729, test_error_observed=0.251310, observed_loss=0.156729, regularization_loss=0.093779, gravity_loss=0.055724
 iteration 1020: train_error_observed=0.155638, test_error_observed=0.250941, observed_loss=0.155638, regularization_loss=0.094055, gravity_loss=0.055711
 iteration 1030: train_error_observed=0.154560, test_error_observed=0.250581, observed_loss=0.154560, regularization_loss=0.094331, gravity_loss=0.055693
 iteration 1040: train_error_observed=0.153495, test_error_observed=0.250230, observed_loss=0.153495, regularization_loss=0.094608, gravity_loss=0.055671
 iteration 1050: train_error_observed=0.152442, test_error_observed=0.249889, observed_loss=0.152442, regularization_loss=0.094885, gravity_loss=0.055645
 iteration 1060: train_error_observed=0.151401, test_error_observed=0.249556, observed_loss=0.151401, regularization_loss=0.095162, gravity_loss=0.055615
 iteration 1070: train_error_observed=0.150371, test_error_observed=0.249231, observed_loss=0.150371, regularization_loss=0.095440, gravity_loss=0.055582
 iteration 1080: train_error_observed=0.149354, test_error_observed=0.248915, observed_loss=0.149354, regularization_loss=0.095718, gravity_loss=0.055545
 iteration 1090: train_error_observed=0.148348, test_error_observed=0.248606, observed_loss=0.148348, regularization_loss=0.095997, gravity_loss=0.055505
 iteration 1100: train_error_observed=0.147354, test_error_observed=0.248305, observed_loss=0.147354, regularization_loss=0.096275, gravity_loss=0.055462
 iteration 1110: train_error_observed=0.146371, test_error_observed=0.248012, observed_loss=0.146371, regularization_loss=0.096553, gravity_loss=0.055416
 iteration 1120: train_error_observed=0.145399, test_error_observed=0.247725, observed_loss=0.145399, regularization_loss=0.096831, gravity_loss=0.055367
 iteration 1130: train_error_observed=0.144438, test_error_observed=0.247446, observed_loss=0.144438, regularization_loss=0.097109, gravity_loss=0.055316
 iteration 1140: train_error_observed=0.143488, test_error_observed=0.247173, observed_loss=0.143488, regularization_loss=0.097387, gravity_loss=0.055262
 iteration 1150: train_error_observed=0.142550, test_error_observed=0.246907, observed_loss=0.142550, regularization_loss=0.097664, gravity_loss=0.055206
 iteration 1160: train_error_observed=0.141621, test_error_observed=0.246647, observed_loss=0.141621, regularization_loss=0.097941, gravity_loss=0.055148
 iteration 1170: train_error_observed=0.140704, test_error_observed=0.246394, observed_loss=0.140704, regularization_loss=0.098217, gravity_loss=0.055087
 iteration 1180: train_error_observed=0.139797, test_error_observed=0.246146, observed_loss=0.139797, regularization_loss=0.098492, gravity_loss=0.055025
 iteration 1190: train_error_observed=0.138901, test_error_observed=0.245904, observed_loss=0.138901, regularization_loss=0.098767, gravity_loss=0.054961
 iteration 1200: train_error_observed=0.138015, test_error_observed=0.245668, observed_loss=0.138015, regularization_loss=0.099042, gravity_loss=0.054895
 iteration 1210: train_error_observed=0.137139, test_error_observed=0.245438, observed_loss=0.137139, regularization_loss=0.099315, gravity_loss=0.054827
 iteration 1220: train_error_observed=0.136274, test_error_observed=0.245213, observed_loss=0.136274, regularization_loss=0.099587, gravity_loss=0.054758
 iteration 1230: train_error_observed=0.135418, test_error_observed=0.244993, observed_loss=0.135418, regularization_loss=0.099859, gravity_loss=0.054687
 iteration 1240: train_error_observed=0.134573, test_error_observed=0.244778, observed_loss=0.134573, regularization_loss=0.100129, gravity_loss=0.054615
 iteration 1250: train_error_observed=0.133738, test_error_observed=0.244568, observed_loss=0.133738, regularization_loss=0.100398, gravity_loss=0.054541
 iteration 1260: train_error_observed=0.132912, test_error_observed=0.244363, observed_loss=0.132912, regularization_loss=0.100666, gravity_loss=0.054467
 iteration 1270: train_error_observed=0.132096, test_error_observed=0.244163, observed_loss=0.132096, regularization_loss=0.100933, gravity_loss=0.054391
 iteration 1280: train_error_observed=0.131290, test_error_observed=0.243967, observed_loss=0.131290, regularization_loss=0.101199, gravity_loss=0.054314
 iteration 1290: train_error_observed=0.130493, test_error_observed=0.243776, observed_loss=0.130493, regularization_loss=0.101463, gravity_loss=0.054237
 iteration 1300: train_error_observed=0.129706, test_error_observed=0.243589, observed_loss=0.129706, regularization_loss=0.101726, gravity_loss=0.054158
 iteration 1310: train_error_observed=0.128928, test_error_observed=0.243406, observed_loss=0.128928, regularization_loss=0.101987, gravity_loss=0.054079
 iteration 1320: train_error_observed=0.128160, test_error_observed=0.243228, observed_loss=0.128160, regularization_loss=0.102247, gravity_loss=0.053998
 iteration 1330: train_error_observed=0.127401, test_error_observed=0.243053, observed_loss=0.127401, regularization_loss=0.102505, gravity_loss=0.053917
 iteration 1340: train_error_observed=0.126651, test_error_observed=0.242883, observed_loss=0.126651, regularization_loss=0.102762, gravity_loss=0.053836
 iteration 1350: train_error_observed=0.125909, test_error_observed=0.242716, observed_loss=0.125909, regularization_loss=0.103017, gravity_loss=0.053754
 iteration 1360: train_error_observed=0.125177, test_error_observed=0.242554, observed_loss=0.125177, regularization_loss=0.103271, gravity_loss=0.053671
 iteration 1370: train_error_observed=0.124454, test_error_observed=0.242394, observed_loss=0.124454, regularization_loss=0.103523, gravity_loss=0.053588
 iteration 1380: train_error_observed=0.123739, test_error_observed=0.242239, observed_loss=0.123739, regularization_loss=0.103773, gravity_loss=0.053504
 iteration 1390: train_error_observed=0.123033, test_error_observed=0.242087, observed_loss=0.123033, regularization_loss=0.104021, gravity_loss=0.053420
 iteration 1400: train_error_observed=0.122335, test_error_observed=0.241938, observed_loss=0.122335, regularization_loss=0.104268, gravity_loss=0.053335
 iteration 1410: train_error_observed=0.121646, test_error_observed=0.241793, observed_loss=0.121646, regularization_loss=0.104513, gravity_loss=0.053251
 iteration 1420: train_error_observed=0.120965, test_error_observed=0.241651, observed_loss=0.120965, regularization_loss=0.104756, gravity_loss=0.053166
 iteration 1430: train_error_observed=0.120292, test_error_observed=0.241512, observed_loss=0.120292, regularization_loss=0.104997, gravity_loss=0.053080
 iteration 1440: train_error_observed=0.119627, test_error_observed=0.241376, observed_loss=0.119627, regularization_loss=0.105237, gravity_loss=0.052995
 iteration 1450: train_error_observed=0.118970, test_error_observed=0.241244, observed_loss=0.118970, regularization_loss=0.105474, gravity_loss=0.052909
 iteration 1460: train_error_observed=0.118322, test_error_observed=0.241114, observed_loss=0.118322, regularization_loss=0.105710, gravity_loss=0.052823
 iteration 1470: train_error_observed=0.117681, test_error_observed=0.240987, observed_loss=0.117681, regularization_loss=0.105944, gravity_loss=0.052738
 iteration 1480: train_error_observed=0.117047, test_error_observed=0.240863, observed_loss=0.117047, regularization_loss=0.106176, gravity_loss=0.052652
 iteration 1490: train_error_observed=0.116421, test_error_observed=0.240742, observed_loss=0.116421, regularization_loss=0.106406, gravity_loss=0.052566
 iteration 1500: train_error_observed=0.115803, test_error_observed=0.240624, observed_loss=0.115803, regularization_loss=0.106635, gravity_loss=0.052480
 iteration 1510: train_error_observed=0.115192, test_error_observed=0.240508, observed_loss=0.115192, regularization_loss=0.106861, gravity_loss=0.052394
 iteration 1520: train_error_observed=0.114589, test_error_observed=0.240395, observed_loss=0.114589, regularization_loss=0.107086, gravity_loss=0.052308
 iteration 1530: train_error_observed=0.113992, test_error_observed=0.240284, observed_loss=0.113992, regularization_loss=0.107309, gravity_loss=0.052222
 iteration 1540: train_error_observed=0.113403, test_error_observed=0.240176, observed_loss=0.113403, regularization_loss=0.107529, gravity_loss=0.052136
 iteration 1550: train_error_observed=0.112821, test_error_observed=0.240070, observed_loss=0.112821, regularization_loss=0.107748, gravity_loss=0.052050
 iteration 1560: train_error_observed=0.112245, test_error_observed=0.239966, observed_loss=0.112245, regularization_loss=0.107966, gravity_loss=0.051965
 iteration 1570: train_error_observed=0.111677, test_error_observed=0.239865, observed_loss=0.111677, regularization_loss=0.108181, gravity_loss=0.051879
 iteration 1580: train_error_observed=0.111115, test_error_observed=0.239766, observed_loss=0.111115, regularization_loss=0.108394, gravity_loss=0.051794
 iteration 1590: train_error_observed=0.110559, test_error_observed=0.239670, observed_loss=0.110559, regularization_loss=0.108606, gravity_loss=0.051709
 iteration 1600: train_error_observed=0.110011, test_error_observed=0.239575, observed_loss=0.110011, regularization_loss=0.108816, gravity_loss=0.051624
 iteration 1610: train_error_observed=0.109468, test_error_observed=0.239483, observed_loss=0.109468, regularization_loss=0.109024, gravity_loss=0.051539
 iteration 1620: train_error_observed=0.108932, test_error_observed=0.239392, observed_loss=0.108932, regularization_loss=0.109230, gravity_loss=0.051455
 iteration 1630: train_error_observed=0.108402, test_error_observed=0.239304, observed_loss=0.108402, regularization_loss=0.109434, gravity_loss=0.051370
 iteration 1640: train_error_observed=0.107879, test_error_observed=0.239218, observed_loss=0.107879, regularization_loss=0.109637, gravity_loss=0.051286
 iteration 1650: train_error_observed=0.107361, test_error_observed=0.239133, observed_loss=0.107361, regularization_loss=0.109838, gravity_loss=0.051202
 iteration 1660: train_error_observed=0.106850, test_error_observed=0.239051, observed_loss=0.106850, regularization_loss=0.110037, gravity_loss=0.051119
 iteration 1670: train_error_observed=0.106344, test_error_observed=0.238970, observed_loss=0.106344, regularization_loss=0.110234, gravity_loss=0.051036
 iteration 1680: train_error_observed=0.105844, test_error_observed=0.238891, observed_loss=0.105844, regularization_loss=0.110429, gravity_loss=0.050953
 iteration 1690: train_error_observed=0.105350, test_error_observed=0.238814, observed_loss=0.105350, regularization_loss=0.110623, gravity_loss=0.050870
 iteration 1700: train_error_observed=0.104861, test_error_observed=0.238739, observed_loss=0.104861, regularization_loss=0.110815, gravity_loss=0.050788
 iteration 1710: train_error_observed=0.104378, test_error_observed=0.238665, observed_loss=0.104378, regularization_loss=0.111006, gravity_loss=0.050705
 iteration 1720: train_error_observed=0.103901, test_error_observed=0.238593, observed_loss=0.103901, regularization_loss=0.111195, gravity_loss=0.050624
 iteration 1730: train_error_observed=0.103429, test_error_observed=0.238523, observed_loss=0.103429, regularization_loss=0.111382, gravity_loss=0.050542
 iteration 1740: train_error_observed=0.102962, test_error_observed=0.238454, observed_loss=0.102962, regularization_loss=0.111567, gravity_loss=0.050461
 iteration 1750: train_error_observed=0.102501, test_error_observed=0.238387, observed_loss=0.102501, regularization_loss=0.111751, gravity_loss=0.050380
 iteration 1760: train_error_observed=0.102044, test_error_observed=0.238321, observed_loss=0.102044, regularization_loss=0.111933, gravity_loss=0.050300
 iteration 1770: train_error_observed=0.101593, test_error_observed=0.238257, observed_loss=0.101593, regularization_loss=0.112114, gravity_loss=0.050220
 iteration 1780: train_error_observed=0.101147, test_error_observed=0.238195, observed_loss=0.101147, regularization_loss=0.112293, gravity_loss=0.050140
 iteration 1790: train_error_observed=0.100706, test_error_observed=0.238133, observed_loss=0.100706, regularization_loss=0.112471, gravity_loss=0.050061
 iteration 1800: train_error_observed=0.100269, test_error_observed=0.238074, observed_loss=0.100269, regularization_loss=0.112647, gravity_loss=0.049982
 iteration 1810: train_error_observed=0.099838, test_error_observed=0.238015, observed_loss=0.099838, regularization_loss=0.112821, gravity_loss=0.049903
 iteration 1820: train_error_observed=0.099411, test_error_observed=0.237958, observed_loss=0.099411, regularization_loss=0.112994, gravity_loss=0.049825
 iteration 1830: train_error_observed=0.098989, test_error_observed=0.237902, observed_loss=0.098989, regularization_loss=0.113166, gravity_loss=0.049747
 iteration 1840: train_error_observed=0.098571, test_error_observed=0.237848, observed_loss=0.098571, regularization_loss=0.113336, gravity_loss=0.049669
 iteration 1850: train_error_observed=0.098158, test_error_observed=0.237795, observed_loss=0.098158, regularization_loss=0.113504, gravity_loss=0.049592
 iteration 1860: train_error_observed=0.097750, test_error_observed=0.237743, observed_loss=0.097750, regularization_loss=0.113671, gravity_loss=0.049515
 iteration 1870: train_error_observed=0.097346, test_error_observed=0.237692, observed_loss=0.097346, regularization_loss=0.113837, gravity_loss=0.049439
 iteration 1880: train_error_observed=0.096946, test_error_observed=0.237643, observed_loss=0.096946, regularization_loss=0.114001, gravity_loss=0.049363
 iteration 1890: train_error_observed=0.096550, test_error_observed=0.237594, observed_loss=0.096550, regularization_loss=0.114164, gravity_loss=0.049287
 iteration 1900: train_error_observed=0.096159, test_error_observed=0.237547, observed_loss=0.096159, regularization_loss=0.114325, gravity_loss=0.049212
 iteration 1910: train_error_observed=0.095772, test_error_observed=0.237501, observed_loss=0.095772, regularization_loss=0.114486, gravity_loss=0.049137
 iteration 1920: train_error_observed=0.095389, test_error_observed=0.237456, observed_loss=0.095389, regularization_loss=0.114644, gravity_loss=0.049063
 iteration 1930: train_error_observed=0.095010, test_error_observed=0.237412, observed_loss=0.095010, regularization_loss=0.114802, gravity_loss=0.048989
 iteration 1940: train_error_observed=0.094635, test_error_observed=0.237369, observed_loss=0.094635, regularization_loss=0.114958, gravity_loss=0.048915
 iteration 1950: train_error_observed=0.094264, test_error_observed=0.237328, observed_loss=0.094264, regularization_loss=0.115113, gravity_loss=0.048842
 iteration 1960: train_error_observed=0.093897, test_error_observed=0.237287, observed_loss=0.093897, regularization_loss=0.115266, gravity_loss=0.048769
 iteration 1970: train_error_observed=0.093534, test_error_observed=0.237247, observed_loss=0.093534, regularization_loss=0.115419, gravity_loss=0.048696
 iteration 1980: train_error_observed=0.093175, test_error_observed=0.237208, observed_loss=0.093175, regularization_loss=0.115570, gravity_loss=0.048624
 iteration 1990: train_error_observed=0.092819, test_error_observed=0.237171, observed_loss=0.092819, regularization_loss=0.115720, gravity_loss=0.048552
 iteration 2000: train_error_observed=0.092467, test_error_observed=0.237134, observed_loss=0.092467, regularization_loss=0.115868, gravity_loss=0.048481
[{'train_error_observed': 0.092467055, 'test_error_observed': 0.23713374},
 {'observed_loss': 0.092467055,
  'regularization_loss': 0.115868166,
  'gravity_loss': 0.048480757}]
_images/Collaborative_filtering_30_202.png

In both models, we observe a steep decrease in train and test error as training progresses, although the regularized model has a higher MSE on both the training and the test set. It must be noted that the quality of recommendation improves when regularization is added, as demonstrated when the artist_neighbors() function is utilized. In addition, we observe in the final evaluation section that the performance of the model improves when regularization is added. The test error decreases similarly to the train error, although it plateaus around the 1000-iteration mark. As expected, the additional loss generated by the regularization functions increases over iterations. We add the following regularization terms to our model.

  • Regularization of the model parameters. This is a common \(\ell_2\) regularization term on the embedding matrices, given by \(r(U, V) = \frac{1}{N} \sum_i \|U_i\|^2 + \frac{1}{M}\sum_j \|V_j\|^2\).

  • A global prior that pushes the prediction of any pair towards zero, called the gravity term. This is given by \(g(U, V) = \frac{1}{MN} \sum_{i = 1}^N \sum_{j = 1}^M \langle U_i, V_j \rangle^2\)

These terms modify the “global” loss (that is, the sum of the network loss and the regularization losses) in order to drive the optimization algorithm in desired directions, e.g. to prevent overfitting.
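As a sanity check, both terms can be computed directly with NumPy. This is a standalone sketch on random matrices (not the training code); the Gram-matrix identity used for the gravity term avoids materializing the full \(N \times M\) prediction matrix.

```python
import numpy as np

# Random stand-ins for the embedding matrices U ([N, k]) and V ([M, k]).
rng = np.random.default_rng(0)
N, M, k = 100, 80, 10
U = rng.normal(size=(N, k))
V = rng.normal(size=(M, k))

# l2 term: r(U, V) = (1/N) sum_i ||U_i||^2 + (1/M) sum_j ||V_j||^2
r = (U ** 2).sum() / N + (V ** 2).sum() / M

# Gravity term: g(U, V) = (1/(MN)) sum_ij <U_i, V_j>^2.
# Uses sum_ij <U_i, V_j>^2 = <U^T U, V^T V>_F (Frobenius inner product of the
# two k x k Gram matrices), which never forms the N x M matrix U V^T.
g = np.sum((U.T @ U) * (V.T @ V)) / (N * M)

# Equivalent direct computation, feasible here because N and M are small.
g_direct = np.sum((U @ V.T) ** 2) / (N * M)
assert np.allclose(g, g_direct)
```

The Gram-matrix form is the reason the gravity term stays cheap even when the number of users and items is large: it costs \(O((N + M)k^2)\) rather than \(O(NMk)\).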

Evaluating the embeddings

We will use two similarity measures to inspect the robustness of our system:

  • Dot product: score of artist j \(\langle u, V_j \rangle\).

  • Cosine angle: score of artist j \(\frac{\langle u, V_j \rangle}{\|u\|\|V_j\|}\).

DOT = 'dot'
COSINE = 'cosine'
def compute_scores(query_embedding, item_embeddings, measure=DOT):
  """Computes the scores of the candidates given a query.
  Args:
    query_embedding: a vector of shape [k], representing the query embedding.
    item_embeddings: a matrix of shape [N, k], such that row i is the embedding
      of item i.
    measure: a string specifying the similarity measure to be used. Can be
      either DOT or COSINE.
  Returns:
    scores: a vector of shape [N], such that scores[i] is the score of item i.
  """
  u = query_embedding
  V = item_embeddings
  if measure == COSINE:
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    u = u / np.linalg.norm(u)
  scores = u.dot(V.T)
  return scores
def user_recommendations(model, user_id, k=15, measure=DOT):
    """Returns the top-k artist recommendations for the given user."""
    scores = compute_scores(
        model.embeddings["userID"][user_id], model.embeddings["artistID"], measure)
    df = pd.DataFrame({
        'score': list(scores),
        'name': artists.sort_values('artistID', ascending=True)['name'],
        'most assigned tag': artists.sort_values('artistID', ascending=True)['mostCommonGenre']
    })
    return df.sort_values(['score'], ascending=False).head(k)


def artist_neighbors(model, title_substring, measure=DOT, k=6):
  # Search for artist ids that match the given substring.
  inv_artist_id_mapping = {v: k for k, v in orginal_artist_ids.items()}
  ids =  artists[artists['name'].str.contains(title_substring)].artistID.values
  titles = artists[artists.artistID.isin(ids)]['name'].values
  if len(titles) == 0:
    raise ValueError("Found no artists with name %s" % title_substring)
  print("Nearest neighbors of : %s." % titles[0])
  if len(titles) > 1:
    print("[Found more than one matching artist. Other candidates: {}]".format(
        ", ".join(titles[1:])))
  artist_id_mapped = inv_artist_id_mapping[ids[0]]
  scores = compute_scores(
      model.embeddings["artistID"][artist_id_mapped], model.embeddings["artistID"],
      measure)
  score_key = measure + ' score'
  df = pd.DataFrame({
      score_key: list(scores),
      'name': artists.sort_values('artistID', ascending=True)['name'],
      'most assigned tag':artists.sort_values('artistID', ascending=True)['mostCommonGenre']
  })
  return df.sort_values([score_key], ascending=False).head(k)

Here, we find the most similar artists to the band The Cure. We also include the most assigned tag associated with each artist. The recommendations are consistent with our domain knowledge of bands similar to The Cure.

artist_neighbors(vanilla_model, "The Cure", DOT)
Nearest neighbors of : The Cure.
dot score name most assigned tag
89473 0.549 O.S.T.R. hip-hop
9437 0.544 The Cure chillout
15847 0.531 Red Hot Chili Peppers chillout
3259 0.530 Coldplay chillout
18364 0.530 Nirvana pop
71153 0.530 Elvis Presley electronic
artist_neighbors(vanilla_model, "The Cure", COSINE)
Nearest neighbors of : The Cure.
cosine score name most assigned tag
9437 1.000 The Cure chillout
8273 0.970 Radiohead chillout
4936 0.964 Depeche Mode chillout
16680 0.961 The Beatles chillout
43413 0.956 David Bowie chillout
32942 0.956 The Smiths groove
artist_neighbors(reg_model, "The Cure", DOT)
Nearest neighbors of : The Cure.
dot score name most assigned tag
16680 3.245 The Beatles chillout
18364 3.213 Nirvana pop
15847 3.171 Red Hot Chili Peppers chillout
12363 3.169 Muse chillout
9437 3.157 The Cure chillout
3259 3.139 Coldplay chillout
artist_neighbors(reg_model, "The Cure", COSINE)
Nearest neighbors of : The Cure.
cosine score name most assigned tag
9437 1.000 The Cure chillout
4936 0.968 Depeche Mode chillout
40639 0.962 Oasis pop
32942 0.962 The Smiths groove
43413 0.959 David Bowie chillout
16680 0.957 The Beatles chillout

We observe that the dot product tends to recommend more popular artists such as Nirvana and The Beatles, whereas cosine similarity recommends more obscure artists. This is likely because the norm of an embedding learned by matrix factorization is often correlated with the item’s popularity. The regularized model seems to output better recommendations, as the variation of the most assigned tag attribute is smaller than in the vanilla model. In addition, Marilyn Manson was recommended by the vanilla model in our initial run; we would argue that these artists are highly dissimilar! However, this observation is subject to change between runs, as we initialize the embeddings with a random Gaussian generator.
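The norm effect can be illustrated with a toy example (hand-made vectors, not the learned embeddings): under dot-product scoring, an item with a large embedding norm can outrank a better-aligned item, while cosine similarity divides the norm out and reverses the ranking.

```python
import numpy as np

query = np.array([1.0, 0.0])
popular = np.array([3.0, 3.0])  # large norm, 45 degrees away from the query
niche = np.array([0.9, 0.1])    # small norm, almost parallel to the query

dot_scores = np.array([popular @ query, niche @ query])
cos_scores = np.array([
    popular @ query / (np.linalg.norm(popular) * np.linalg.norm(query)),
    niche @ query / (np.linalg.norm(niche) * np.linalg.norm(query)),
])

assert dot_scores.argmax() == 0  # dot product favours the high-norm item
assert cos_scores.argmax() == 1  # cosine favours the better-aligned item
```

Because interaction counts inflate embedding norms during training, this is exactly the mechanism by which dot-product scoring surfaces popular artists first.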

def artist_embedding_norm(models):
  """Visualizes the norm and number of user-artist interactions of the artist embeddings.
  Args:
    models: a single trained model, or a list of trained models.
  """
  if not isinstance(models, list):
    models = [models]
  # Build the dataframe unconditionally; previously it was only created inside
  # the branch above, leaving df undefined when a list was passed in.
  df = pd.DataFrame({
      'name': artists.sort_values('artistID', ascending=True)['name'].values,
      'number of user-artist interactions': user_artists[['artistID', 'userID']].sort_values('artistID', ascending=True).groupby('artistID').count()['userID'].values,
  })
  charts = []
  brush = alt.selection_interval()
  for i, model in enumerate(models):
    norm_key = 'norm'+str(i)
    df[norm_key] = np.linalg.norm(model.embeddings["artistID"], axis=1)
    nearest = alt.selection(
        type='single', encodings=['x', 'y'], on='mouseover', nearest=True,
        empty='none')
    base = alt.Chart().mark_circle().encode(
        x='number of user-artist interactions',
        y=norm_key,
        color=alt.condition(brush, alt.value('#4c78a8'), alt.value('lightgray'))
    ).properties(
        selection=nearest).add_selection(brush)
    text = alt.Chart().mark_text(align='center', dx=5, dy=-5).encode(
        x='number of user-artist interactions', y=norm_key,
        text=alt.condition(nearest, 'name', alt.value('')))
    charts.append(alt.layer(base, text))
  return alt.hconcat(*charts, data=df)

artist_embedding_norm(reg_model)
def visualize_movie_embeddings(data, x, y):
  genre_filter = alt.selection_multi(fields=['top10TagValue'])
  genre_chart = alt.Chart().mark_bar().encode(
      x="count()",
      y=alt.Y('top10TagValue'),
      color=alt.condition(
          genre_filter,
          alt.Color("top10TagValue:N"),
          alt.value('lightgray'))
  ).properties(height=300, selection=genre_filter)
  nearest = alt.selection(
      type='single', encodings=['x', 'y'], on='mouseover', nearest=True,
      empty='none')
  base = alt.Chart().mark_circle().encode(
      x=x,
      y=y,
      color=alt.condition(genre_filter, "top10TagValue", alt.value("whitesmoke")),
  ).properties(
      width=600,
      height=600,
      selection=nearest)
  text = alt.Chart().mark_text(align='left', dx=5, dy=-5).encode(
      x=x,
      y=y,
      text=alt.condition(nearest, 'name', alt.value('')))
  return alt.hconcat(alt.layer(base, text), genre_chart, data=data)

def tsne_movie_embeddings(model):
  """Visualizes the movie embeddings, projected using t-SNE with Cosine measure.
  Args:
    model: A MFModel object.
  """
  tsne = sklearn.manifold.TSNE(
      n_components=2, perplexity=40, metric='cosine', early_exaggeration=10.0,
      init='pca', verbose=True, n_iter=400)

  print('Running t-SNE...')
  V_proj = tsne.fit_transform(model.embeddings["artistID"])
  artists.loc[:,'x'] = V_proj[:, 0]
  artists.loc[:,'y'] = V_proj[:, 1]
  return visualize_movie_embeddings(artists, 'x', 'y')

T-distributed stochastic neighbor embedding (t-SNE) is a dimensionality-reduction algorithm useful for visualizing high-dimensional data. We use this algorithm to visualize the embeddings of the regularized model. Due to the large number of user-submitted semantic categories, we color-code only the most common tags, with the rest labelled as ‘N/A’. Although the sea of orange indicating ‘N/A’ makes it difficult to interpret these results, the regularized model seems to adequately cluster artists of similar genres in its embeddings.

tsne_movie_embeddings(reg_model)
Running t-SNE...
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 17632 samples in 0.001s...
[t-SNE] Computed neighbors for 17632 samples in 5.127s...
[t-SNE] Computed conditional probabilities for sample 1000 / 17632
[t-SNE] Computed conditional probabilities for sample 2000 / 17632
[t-SNE] Computed conditional probabilities for sample 3000 / 17632
[t-SNE] Computed conditional probabilities for sample 4000 / 17632
[t-SNE] Computed conditional probabilities for sample 5000 / 17632
[t-SNE] Computed conditional probabilities for sample 6000 / 17632
[t-SNE] Computed conditional probabilities for sample 7000 / 17632
[t-SNE] Computed conditional probabilities for sample 8000 / 17632
[t-SNE] Computed conditional probabilities for sample 9000 / 17632
[t-SNE] Computed conditional probabilities for sample 10000 / 17632
[t-SNE] Computed conditional probabilities for sample 11000 / 17632
[t-SNE] Computed conditional probabilities for sample 12000 / 17632
[t-SNE] Computed conditional probabilities for sample 13000 / 17632
[t-SNE] Computed conditional probabilities for sample 14000 / 17632
[t-SNE] Computed conditional probabilities for sample 15000 / 17632
[t-SNE] Computed conditional probabilities for sample 16000 / 17632
[t-SNE] Computed conditional probabilities for sample 17000 / 17632
[t-SNE] Computed conditional probabilities for sample 17632 / 17632
[t-SNE] Mean sigma: 0.178284
[t-SNE] KL divergence after 250 iterations with early exaggeration: 77.287689
[t-SNE] KL divergence after 400 iterations: 2.794723
def m_embedding_norm(models):
  """Visualizes the norm and number of ratings of the movie embeddings.
  Args:
    models: a single trained model, or a list of trained models.
  """
  if not isinstance(models, list):
    models = [models]
  # Build the dataframe unconditionally; previously it was only created inside
  # the branch above, leaving df undefined when a list was passed in.
  df = pd.DataFrame({
      'title': artists.sort_values('artistID', ascending=True)['name'].values,
      'num_ratings': user_artists[['artistID', 'userID']].sort_values('artistID', ascending=True).groupby('artistID').count()['userID'].values,
  })
  charts = []
  brush = alt.selection_interval()
  for i, model in enumerate(models):
    norm_key = 'norm'+str(i)
    df[norm_key] = np.linalg.norm(model.embeddings["artistID"], axis=1)
    nearest = alt.selection(
        type='single', encodings=['x', 'y'], on='mouseover', nearest=True,
        empty='none')
    base = alt.Chart().mark_circle().encode(
        x='num_ratings',
        y=norm_key,
        color=alt.condition(brush, alt.value('#4c78a8'), alt.value('lightgray'))
    ).properties(
        selection=nearest).add_selection(brush)
    text = alt.Chart().mark_text(align='center', dx=5, dy=-5).encode(
        x='num_ratings', y=norm_key,
        text=alt.condition(nearest, 'title', alt.value('')))
    charts.append(alt.layer(base, text))
  return alt.hconcat(*charts, data=df)

Demo

You can find the most similar artists to a specified artist (that is contained in Last.FM) using the artist_neighbors() function. Similarly, you can find the top-10 recommendations for a particular userID [0 to 1891] using the user_recommendations() function. The first argument specifies the desired model, the second the userID, and the third the number of top-k recommendations. The fourth argument is the similarity measure, either DOT or COSINE (default = DOT; pass the constant, not a quoted string).

user_recommendations(reg_model, 234, 10, COSINE)
score name most assigned tag
126513 0.901 Graforréia Xilarmônica rock
126491 0.901 Bandas Gaúchas - www.DownsMtv.com N/A
126582 0.893 Validuaté N/A
126554 0.884 Moreira da Silva N/A
126539 0.884 Menstruação Anarquika N/A
126400 0.857 The Vibrators punk
126490 0.811 Street Bulldogs N/A
123515 0.807 Forgotten Boys rock
126451 0.804 Tim Maia pop
180048 0.804 Paul & Linda McCartney rock

To further demonstrate the robustness of the system and measure the serendipity of our model, we incorporate the top artists that we listen to on Spotify (i.e. an unknown user). Note that these artists must also be in the Last.FM dataset. The recommendation system should output similar artists based on its artist embeddings. The Spotipy library is used to interact with Spotify’s API, and the dot product is used as the similarity measure. Due to the short-lived nature of the Spotify token and the fact that you have to sign into a pop-up to retrieve the authentication token, we simply list our top 5 artists manually; otherwise, Jupyter Book would stall while building, waiting for our response. However, we provide the code used to retrieve the short-lived token for verification purposes.

"""
import spotipy
from spotipy.oauth2 import SpotifyOAuth
client_id = '<insert your client id>'
client_secret = '<insert your client secret>'
redirect_url = '<insert your redirect uri>'
scope = "user-top-read user-read-playback-state streaming ugc-image-upload playlist-modify-public"

authenticate_manager = spotipy.oauth2.SpotifyOAuth(
    client_id=client_id, client_secret=client_secret, redirect_uri=redirect_url,
    scope=scope, show_dialog=True)
sp = spotipy.Spotify(auth_manager=authenticate_manager)

artists_long = sp.current_user_top_artists(limit=5, time_range="long_term")
"""
top_5_artists =[
                 'Coldplay',
                 'Paramore',
                 'Arctic Monkeys',
                 'Lily Allen',
                 'Miley Cyrus'
]
spotify_recommendations_df = pd.DataFrame()
for artist in top_5_artists:
  similar_artist_df = artist_neighbors(reg_model, artist)[['name', 'dot score']]
  spotify_recommendations_df = pd.concat([spotify_recommendations_df, similar_artist_df])
spotify_recommendations_df.sort_values('dot score', ascending=False).head(10)
Nearest neighbors of : Coldplay.
[Found more than one matching artist. Other candidates: Jay-Z & Coldplay, Coldplay/U2]
Nearest neighbors of : Paramore.
[Found more than one matching artist. Other candidates: Paramore攀]
Nearest neighbors of : Arctic Monkeys.
[Found more than one matching artist. Other candidates: Arctic Monkeys vs The Killers]
Nearest neighbors of : Lily Allen.
Nearest neighbors of : Miley Cyrus.
[Found more than one matching artist. Other candidates: Miley Cyrus攀, Demi Lovato Ft. Miley Cyrus Ft. Selena Gomez Ft. Jonas Brothers, Miley Cyrus and Billy Ray Cyrus, Miley Cyrus and John Travolta, Hannah Montana and Miley Cyrus]
name dot score
3259 Coldplay 3.641
12363 Muse 3.536
37842 Paramore 3.503
24447 Lily Allen 3.489
6543 Lady Gaga 3.462
6543 Lady Gaga 3.460
36290 Eminem 3.452
17472 The Killers 3.441
36290 Eminem 3.440
17278 Kings of Leon 3.437

We believe these recommendations are good: when our model was given an artist from our top five, it often recommended other artists from that same top five.
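One way to make this observation more quantitative is to count how many recommended names are themselves seed artists. A self-contained sketch, using a small hand-made stand-in for the recommendation frame (the real frame is built by the loop above):

```python
import pandas as pd

seeds = ['Coldplay', 'Paramore', 'Arctic Monkeys', 'Lily Allen', 'Miley Cyrus']
# Hypothetical stand-in for the concatenated neighbour lists.
recs = pd.DataFrame({'name': ['Coldplay', 'Muse', 'Paramore', 'Lily Allen', 'Lady Gaga']})

# Fraction of recommendations that are themselves seed artists.
overlap = recs['name'].isin(seeds).sum()
overlap_rate = overlap / len(recs)
assert overlap == 3
```

A high overlap rate supports the "top five recommends top five" observation, though as noted it will vary across runs due to the random initialization.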

Evaluation Code

This is the code needed to produce the in-depth model comparison. As we decided to use different notebooks for different models, the results of this code will be combined and explained later in the book.

## create holdout test set for each user (15 items)
user_artists = pd.read_csv('data/user_artists.dat', sep='\t')
user_ids = []
holdout_artists = []
for user_id in user_artists.userID.unique():
  # Take each user's 15 most-listened artists (largest 'weight') as the holdout
  # set; sorting must be descending to get the top rather than the bottom 15.
  top_15_artists = user_artists[user_artists.userID == user_id].sort_values(by='weight', ascending=False).head(15).artistID.tolist()
  if len(top_15_artists) == 15:
    holdout_artists.append(top_15_artists)
    user_ids.append(user_id)
holdout_df = pd.DataFrame(data={'userID': user_ids, 'holdout_artists': holdout_artists})

holdout_df.to_csv('data/evaluation/test-set.csv', index=False)
## Find the vanilla and regularised models' predictions for each user.
def get_top_15_model_predictions(model, measure):
  """Computes the top 15 predictions for a given model.
  Args:
    model: the trained model.
    measure: a string specifying the similarity measure to be used. Can be
      either DOT or COSINE.
  Returns:
    predicted_df: a dataframe containing userIDs, their top 15 artists by the
      model, and the corresponding scores.
  """
  artist_name_id_dict = dict(zip(artists['name'], artists['artistID']))
  user_ids = []
  predicted_artists = []
  scores_list = []
  for new_user_id, orginal_user_id in orginal_user_ids.items():
    # Compute the recommendations once per user, then read off names and scores.
    top_15 = user_recommendations(model, new_user_id, k=15, measure=measure)
    top_15_names = top_15['name'].values
    top_15_scores = top_15['score'].values.tolist()
    artist_ids = [artist_name_id_dict[name] for name in top_15_names]
    predicted_artists.append(artist_ids)
    user_ids.append(orginal_user_id)
    scores_list.append(top_15_scores)
  predicted_df = pd.DataFrame(data={'userID': user_ids, 'predictions_artists': predicted_artists, 'score': scores_list})
  return predicted_df
# save the recommended artists into dfs in the data/evaluation folder
vanilla_dot_pred= get_top_15_model_predictions(vanilla_model, measure=DOT)
vanilla_cos_pred = get_top_15_model_predictions(vanilla_model, measure=COSINE)
reg_dot_pred= get_top_15_model_predictions(reg_model, measure=DOT)
reg_cos_pred = get_top_15_model_predictions(reg_model, measure=COSINE)

vanilla_dot_pred.to_csv('data/evaluation/vannila_dot_pred.csv',index=False)
vanilla_cos_pred.to_csv('data/evaluation/vanila_cos_pred.csv',index=False)
reg_dot_pred.to_csv('data/evaluation/reg_dot_pred.csv',index=False)
reg_cos_pred.to_csv('data/evaluation/reg_cos_pred.csv',index=False)
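As a hedged sketch of the downstream comparison (the helper name `mean_recall_at_15` is ours, not from the later chapter), the saved holdout and prediction frames could be scored with recall@15 per user. Note that list columns read back from CSV arrive as strings and would first need parsing, e.g. with `ast.literal_eval`.

```python
import pandas as pd

def mean_recall_at_15(holdout_df, predicted_df):
    """Fraction of each user's holdout artists recovered in the model's
    top-15 predictions, averaged over users present in both frames."""
    merged = holdout_df.merge(predicted_df, on='userID')
    recalls = [
        len(set(held) & set(pred)) / len(held)
        for held, pred in zip(merged['holdout_artists'], merged['predictions_artists'])
    ]
    return sum(recalls) / len(recalls)
```

For example, with two toy users whose holdout sets are `[1, 2, 3]` and `[4, 5, 6]` and predictions `[1, 2, 9]` and `[7, 8, 9]`, the mean recall is (2/3 + 0) / 2 = 1/3.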